The Decoupled-Style Prefetch Architecture

Authors

  • Kevin D. Rich
  • Matthew K. Farrens

Abstract

Decoupled processing seeks to dynamically schedule memory accesses in order to tolerate memory latency by prefetching operands. Since decoupled processors cannot speculatively issue memory operations, control flow operations can significantly impact their ability to prefetch data. The prefetching architecture proposed here seeks to leverage the dynamic scheduling benefits of decoupled processing while allowing memory operations to be speculatively invoked. The prefetching mechanism is evaluated using the SPEC95 suite of benchmarks and significant reductions in cache miss rate are achieved, resulting in speed-ups of over 40% of peak for most of the inputs.
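
The claim that control flow limits non-speculative decoupled prefetching can be illustrated with a toy timing model. This is a hypothetical sketch, not the authors' mechanism, and it assumes the following: time is counted in operations; a loop body is a data-dependent branch, one load, and two ALU operations; a branch resolves when execution reaches it; the access stream may run at most `window` operations ahead of execution; and a prefetch is timely only if it is issued at least `latency` operations before the load is used. Wrong-path prefetches and cache pollution are ignored. All names (`Op`, `make_loop_trace`, `prefetch_lead`, `covered_loads`) are invented for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str  # "branch", "load", or "alu" (hypothetical op classes)


def make_loop_trace(iterations):
    # Hypothetical loop body: a data-dependent branch, one load, two ALU ops.
    body = [Op("branch"), Op("load"), Op("alu"), Op("alu")]
    return body * iterations


def prefetch_lead(trace, i, window, speculative):
    """Return how many ops before its use the load at index i can be issued."""
    if speculative:
        # Speculative issue: the access stream may run past unresolved
        # branches, limited only by the assumed run-ahead window.
        return min(window, i)
    # Non-speculative issue: the access stream cannot pass a branch until
    # execution reaches (and resolves) it. With no preceding branch,
    # issue can begin at op 0.
    last_branch = max(
        (j for j in range(i) if trace[j].kind == "branch"), default=0
    )
    return min(window, i - last_branch)


def covered_loads(trace, latency, window, speculative):
    """Count loads whose prefetch lead is at least the assumed miss latency."""
    loads = [i for i, op in enumerate(trace) if op.kind == "load"]
    covered = sum(
        1 for i in loads
        if prefetch_lead(trace, i, window, speculative) >= latency
    )
    return covered, len(loads)


if __name__ == "__main__":
    trace = make_loop_trace(10)
    for spec in (False, True):
        hit, total = covered_loads(trace, latency=6, window=16, speculative=spec)
        print(f"speculative={spec}: {hit}/{total} loads prefetched in time")
```

With these assumed parameters, the non-speculative model prefetches none of the ten loads in time, because each load sits one operation after an unresolved branch, while the speculative model covers every load from the third onward (8 of 10). The model only illustrates the qualitative argument; the paper's actual SPEC95 results are not reproduced by it.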

Similar resources

Improving the Performance of Loop-Based Programs Using a Prefetch Processor

We present an architecture called the CAPP (Computing And Prefetching Processor). The CAPP provides high performance for loop-based scientific and signal processing programs by improving memory system performance by providing a decoupled prefetch processor. The prefetch processor improves performance by relieving the main processor of prefetching instruction overhead and allowing the prefetch d...

Simplifying Hardware for Out Of Order Execution using the Decoupling Paradigm

Future hardware and software technology will try to provide improved performance by extracting higher levels of parallelism. However, the cost of a main memory access (in terms of missed instruction issue slots) increases with faster processors and greater issue widths. For this reason, latency hiding technology remains one of the most important parts of high performance processor designs. In this ...

Improving the parallelism and concurrency in decoupled architectures

Concurrency between access and execution has been exploited by queues in many decoupled access-execute architectures, but data-dependent control dependencies often prohibit prefetching of data to queues. This paper investigates a technique to facilitate anticipatory loading to queues even in the presence of data-dependent control dependencies. The proposed method consists of fetching along one or ...

Decoupled Sampling for Real-Time Graphics Pipelines

We propose decoupled sampling, an approach that decouples shading from visibility sampling in order to enable motion blur and depth-of-field at reduced cost. More generally, it enables extensions of modern real-time graphics pipelines that provide controllable shading rates to trade off quality for performance. It can be thought of as a generalization of GPU-style multisample antialiasing (MSA...

A Decoupled Fetch-Execute Engine with Static Branch Prediction Support

We describe a method for supporting static branch prediction on a decoupled fetch-execute pipeline. Using instruction buffers to decouple instruction fetch from the execute pipeline is an effective way to minimize instruction cache penalties by allowing instruction fetch and stall miss handling to proceed independent of the execution pipeline. Dynamic branch prediction is typically used with su...

Publication date: 2000